Principal component analysis
A technique for finding the orthogonal axes (principal components) along which the data varies the most.
See also Singular value decomposition.
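The link to the singular value decomposition can be made concrete with a minimal sketch, assuming NumPy and a data matrix with one observation per row. The function name `pca_svd` and the toy data are illustrative assumptions, not taken from any of the references on this page.

```python
import numpy as np

def pca_svd(X, k):
    """Return the top-k principal axes, scores, and explained-variance ratios."""
    Xc = X - X.mean(axis=0)                  # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:k]                            # rows are the principal axes
    scores = U[:, :k] * s[:k]                # centered data projected onto the axes
    explained = s**2 / np.sum(s**2)          # share of total variance per axis
    return axes, scores, explained[:k]

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points stretched along the direction (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(200, 2))
axes, scores, explained = pca_svd(X, k=1)
```

Here the first axis should point (up to sign) along the diagonal, since that is the direction in which the toy data was stretched.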
People
- Karl Rohe
- PCA with a large matrix is magic: https://twitter.com/karlrohe/status/1475494422028640265
- a one minute data exercise for PCA intuition: https://twitter.com/karlrohe/status/1482022457145974789
Books
How to
Preprocessing
Excerpt:
- sqrt any count features. log any heavy tailed features. PCA prefers things that are “homoscedastic” (which is my favorite word to ASMR and I literally do it in class) sqrt and log are “variance stabilizing transformations”.
- localization is noise. regularize when you normalize.
- if you make a histogram of a component (or loading) vector and it has really big outliers, that is localization. It’s bad. It means the vector is noise.
- diagnostic: https://github.com/karlrohe/LocalizationDiagnostic
- To address localization, I would suggest normalizing by regularized row/column sums. This works like fucking magic. Not even kidding. D_r = Diagonal(1/sqrt(rs + mean(rs))); D_c = Diagonal(1/sqrt(cs + mean(cs))). Do SVD on D_r A D_c.
- paper: Zhang2018understanding
- youtube: https://www.youtube.com/watch?v=lOCoa3hYR4Y
- and my favorite rule, the Cheshire cat rule - “One day Alice came to a fork in the road and saw a Cheshire cat in a tree. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ was his response. ‘I don’t know,’ Alice answered. ‘Then,’ said the cat, ‘it doesn’t matter.’”
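The preprocessing steps in the excerpt can be sketched roughly as follows, assuming NumPy and a non-negative count matrix `A`, with `rs` and `cs` read as row and column sums. The helper names `regularized_normalize` and `peak_ratio` are hypothetical, and `peak_ratio` is only a crude stand-in for the histogram check, not the linked LocalizationDiagnostic.

```python
import numpy as np

def regularized_normalize(A):
    """Scale A by regularized row/column sums, i.e. D_r A D_c."""
    rs = A.sum(axis=1)                       # row sums
    cs = A.sum(axis=0)                       # column sums
    d_r = 1.0 / np.sqrt(rs + rs.mean())      # diagonal of D_r
    d_c = 1.0 / np.sqrt(cs + cs.mean())      # diagonal of D_c
    return d_r[:, None] * A * d_c[None, :]   # same as diag(d_r) @ A @ diag(d_c)

def peak_ratio(v):
    """Largest absolute entry relative to the root-mean-square entry.

    Values near 1 mean the mass is spread out; large values suggest
    a few outlying entries dominate (possible localization).
    """
    return np.max(np.abs(v)) / np.sqrt(np.mean(v**2))

rng = np.random.default_rng(0)
# Hypothetical count matrix (assumption: Poisson counts, 50 rows x 20 columns).
counts = rng.poisson(lam=3.0, size=(50, 20)).astype(float)
A = np.sqrt(counts)                          # sqrt as a variance-stabilizing transform
L = regularized_normalize(A)
U, s, Vt = np.linalg.svd(L, full_matrices=False)
ratios = [peak_ratio(U[:, j]) for j in range(3)]  # check the leading vectors
```

For heavy-tailed (non-count) features, `np.log` would take the place of `np.sqrt` in the transform step. How large a `peak_ratio` counts as "really big outliers" is a judgment call; plotting the histogram, as the excerpt suggests, is the more direct check.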
Video
Articles
- A tutorial on Principal Components Analysis by Lindsay I Smith.
- A Tutorial on Principal Component Analysis by Jonathon Shlens
- Principal component analysis by Hervé Abdi and Lynne J
- Michael E and Christopher M, Probabilistic principal component analysis. Journal of the Royal Statistical Society. Series B (Statistical Methodology) (1999) vol. 61 (3) pp. 611-622
- Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View) by Cosma Shalizi
- Robust PCA?
- http://jeremykun.wordpress.com/2012/06/28/principal-component-analysis/
- What is principal component analysis by Lior Pachter
- Points of Significance: Principal component analysis
- This 🧵 gives my first heuristic understanding of PCA
- Making sense of principal component analysis, eigenvectors & eigenvalues
Studies
- Ilin2010practical - missing values